ref: document import-url cloud versioning changes #4142

Merged: 7 commits, Dec 22, 2022


Contributor

@pmrowla pmrowla commented Nov 29, 2022

@pmrowla pmrowla self-assigned this Nov 29, 2022
Contributor

github-actions bot commented Nov 29, 2022

Link Check Report

There were no links to check!

file. By default, DVC will automatically capture cloud versioning information
if the URL contains a cloud versioning ID. When `--version-aware` is provided
along with a URL that does not contain a cloud versioning ID, DVC will capture
the latest version of the file.
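
The rule described in this option text can be sketched as a small decision function. This is only an illustration and not DVC's actual implementation: the function name `resolve_version_id`, the `lookup_latest` callback, and the use of a `versionId` query parameter to mark a versioned URL are all assumptions made for the example.

```python
from typing import Callable, Optional
from urllib.parse import parse_qs, urlsplit


def resolve_version_id(
    url: str,
    version_aware: bool,
    lookup_latest: Callable[[str], str],
) -> Optional[str]:
    """Return the version ID to record for an import, or None if none applies.

    Hypothetical helper: `lookup_latest` stands in for a call to the cloud
    provider that returns the newest version ID of the object at `url`.
    """
    # Assumption: a versioned URL carries its ID in a `versionId` query parameter.
    query = parse_qs(urlsplit(url).query)
    explicit = query.get("versionId")
    if explicit:
        # The URL already pins a version, so it is captured by default.
        return explicit[0]
    if version_aware:
        # No version in the URL, but the flag asks for the latest one.
        return lookup_latest(url)
    # Neither condition holds: no versioning information is captured.
    return None
```

With a stub such as `lambda url: "latest-id"` passed as `lookup_latest`, the three branches (pinned URL, flag without a pinned version, neither) can be exercised without any cloud access.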
Collaborator

Let's also explain that dvc will pull that version from the source location even if it's overwritten, and will not push another copy of it to the remote.

cc @jorgeorpinel Is there somewhere in the data management user guide we want to add this info also?

Contributor

@jorgeorpinel jorgeorpinel Dec 3, 2022

We'll def. need UG updates to go over cloud versioning (feel free to make a separate docs issue) -- can't explain everything in an option text. For now I'd focus on what the flag does, and put some explanations in the Description (which in this case is already super long and should be rewritten/ moved to UG eventually).

@jorgeorpinel jorgeorpinel added A: docs Area: user documentation (gatsby-theme-iterative) C: ref Content of /doc/*-reference labels Dec 3, 2022
Contributor

@jorgeorpinel jorgeorpinel left a comment

Committing...

content/docs/user-guide/project-structure/dvc-files.md (outdated, resolved)
content/docs/user-guide/project-structure/dvcyaml-files.md (outdated, resolved)
@@ -265,6 +265,7 @@ These include a subset of the fields in `.dvc` file
| `persist` | Whether the output file/dir should remain in place during `dvc repro` (`false` by default: outputs are deleted when `dvc repro` starts) |
| `checkpoint` | (Optional) Set to `true` to let DVC know that this output is associated with [checkpoint experiments](/doc/user-guide/experiment-management/checkpoints). These outputs are reverted to their last cached version at `dvc exp run` and also `persist` during the stage execution. |
| `desc` | (Optional) User description for this output. This doesn't affect any DVC operations. |
| `push` | Whether or not this file or directory, when previously <abbr>cached</abbr>, is uploaded to remote storage by `dvc push` (`true` by default). |
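
To make the default in the `push` row concrete, here is a minimal sketch of how a missing field can be treated as `true` when deciding which outputs to upload. It does not reproduce DVC's internals: the function `paths_to_push`, the dictionary layout, and the example paths are hypothetical and chosen only for illustration.

```python
from typing import Any, Dict, List


def paths_to_push(outs: List[Dict[str, Any]]) -> List[str]:
    """Return the paths of outputs eligible for upload.

    A missing `push` key is treated as True, mirroring the documented default.
    """
    return [out["path"] for out in outs if out.get("push", True)]


# Hypothetical outputs of a pipeline stage.
outs = [
    {"path": "data/raw.csv"},                      # no `push` field: defaults to True
    {"path": "data/features.csv", "push": False},  # explicitly excluded from upload
    {"path": "model.pkl", "push": True},
]
print(paths_to_push(outs))  # prints ['data/raw.csv', 'model.pkl']
```

Only the output that sets `push` to `false` is skipped. That is the opt-out behavior discussed in this thread for intermediate pipeline outputs.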
Contributor

@jorgeorpinel jorgeorpinel Dec 3, 2022

Echoing iterative/dvc#8581 (comment):

Should we plan to recommend this a lot in Data Pipeline docs? Specifically for intermediate pipeline outputs. Assuming the happy path out there is to push only raw data and likely final ML model files (everything else may be best to dvc repro when needed).

If we don't at least emphasize the possibility, users may realize too late they have pushed a bunch of intermediate output versions and they are pretty difficult to clean up with dvc gc (support example).

Contributor Author

@pmrowla pmrowla Dec 6, 2022

I'm not sure that not pushing is the right default behavior, even for intermediate outputs. If the user wants to take advantage of run-cache to not re-run stages that have already been reproduced, they still need to push/pull intermediate outs.

Collaborator

@jorgeorpinel Thinking about it some more, I like the suggestion and think it makes sense as a possible product direction to make it easier to get started with pipelines, so let's brainstorm more on it.

@dberenbaum
Collaborator

Related to #4089

Comment on lines 51 to 54
For stages created with `dvc import-url` and a cloud-versioned URL, `--rev`
can be used to specify an object version ID to use. By default, the import will
be updated to the latest version from cloud storage.
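
As a rough illustration of the two modes in this passage, the sketch below assembles a `dvc update` command line with or without `--rev`. The helper `build_update_command` and the example target and version ID are hypothetical. The sketch only builds the argument list and does not run DVC.

```python
from typing import List, Optional


def build_update_command(target: str, rev: Optional[str] = None) -> List[str]:
    """Build the argument list for updating an import stage.

    With `rev` set, a specific object version ID is requested; without it,
    the command is left to pick up the latest version by default.
    """
    cmd = ["dvc", "update"]
    if rev is not None:
        cmd += ["--rev", rev]
    cmd.append(target)
    return cmd


# Hypothetical .dvc file and version ID, for illustration only.
print(build_update_command("data.csv.dvc"))
print(build_update_command("data.csv.dvc", rev="abc123"))
```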

Collaborator

Minor: link back to the --version-aware flag in import-url?

@shcheklein
Member

@dberenbaum @pmrowla qq - what is the status of this, and the cloud versioning in general?

@dberenbaum
Collaborator

> what is the status of this, and the cloud versioning in general?

It's on my plate to review and merge the outstanding docs PRs (here and #4165, and tracked more broadly in #4089). We discussed in sprint meeting and agreed we could merge now with an admon that it's experimental.

In Q1, the plan is to address performance issues and publicly promote it. Initial thoughts on blog post are in Notion.

@dberenbaum dberenbaum merged commit 90fea91 into main Dec 22, 2022
@dberenbaum dberenbaum deleted the import-url-versioning branch December 22, 2022 19:53
@jorgeorpinel jorgeorpinel mentioned this pull request Dec 23, 2022
1 task
shcheklein pushed a commit that referenced this pull request Feb 17, 2023
* guide: draft structure of Data Mgmt and
some updates around the topic in existing docs

* guide: full text for draft intro to DM

* guide: hide cloud versioning info
per #4042 (review)

* guide: clarify Data Mgmt parts and
add prospective figure titles

* guide: add figure drafts to Data Mgmt

* guide: SCM->VC (Data Mgmt)

* guide: update 2 figs and add 1 more (Data Mgmt)

* guide: roll back unrelated changes
per #4042 (review)

* guide: mention clouds first (DM) and

and update fig. 1
per #4042 (review)

* guide: flatten DM index
per #4042 (review)

* guide: updates to DM/ DV
moved from #4053 (review)

* guide: add DM/ Data Versioning page

per #4042 (comment)

* guide: update outdated link

* guide: revert more unrelatedly changed files

per #4042 (review)

* guide: remove unused ref link

* guide: DM/ Remote Storage (not just Setup) and

and some links from cmd refs
and avoid term "data remote"
and some admons nearby...

* guide: remove a comment

* guide: draft for DM/ Remote Storage content

* ref: expand config.remote and link to/from Remotes guide

* ref: fix remote config file examples

* guide: complete Remote Config section and

and add Project config section to DM/ DV guide

* ref: rewrite remote add and modify Descs

* guide: complete list of supported storage types

* ref: rewrite remote index page from

extracted from #4053

* guide: clarify `remote modify` phrase in

in the Remote config section of DM/ Remote Storage

* Update content/docs/user-guide/data-management/data-versioning.md

* guide: update versioning config

per #4058 (review)

* guide: don't call remote storage "additional" here

(in the DM/ Remote Storage guide)
per #4058 (review)

Co-authored-by: Dave Berenbaum <[email protected]>

* guide: pull -> download (DM/ RS intro)

* guide: remove "optional" from Remote Storage nav & title

per #4058 (review)

* guide: splits and notes around Data Mgmt index page

rel. #4042 (comment)

* guide: Data Mgmt intro + note updates

* guide: draft of all contents +

+ remove comments

* guide: small impros to Data Mgmt

in prep for #4042 (review)

* guide: rewrite Data Mgmt index in before/after form

per #4042 (review)

* guide: add draft figure for Data Mgmt

* guide: simplify/refocus data mgmt index

per #4042 (review)

* work around commented header bug

* guide: drop DM/ DV page

* guide: rewrite DM intro and

- hide benefits (for now)
- remove codification comment block

* guide: use DM table instead of figure for now

* guide: rewrite Data Mgmt story

* guide: add draft figures to Data Mgmt

* guide: simplify Data Mgmt story and benefits

* guide: remove unused images (DM)

* guide: update Data Mgmt figures (v1)

* guide: rewrite text of Data Mgmt index

* guide: update Data Mgmt figures

* guide: iterate on Data Mgmt again

* guide: update Data Mgmt figs

* guide: more supporting info about Data Mgmt

* guide: update figures (much more concrete) and

and matching text updates

* guide: edits to How it works (Data Mgmt)

* guide: update Data Mgmt figures

Rel. #4042 (comment)

* guide: emphasize dataset versions in UG fig 1

Rel. #4042 (comment)

* guide: update Data Mgmt figures (with notes),

expand img captions,
and update text accordingly.

* guide: more updates to text and figure styles,

esp. to the first half
and comment some stuff out (temporary)

* guide: update figures and text (Data Mgmt) ...

Using a tabs toggle for the 2nd fig.

* guide: Data Management text (section 1)

finalized for this version of figures

* guide: Data Management (main text)

finalized for this version of figures

* guide: Data Management (secondary text)

pending diagram and code sample(s)

* guide: add DVC data mgmt technical diagram &

dummy sample CLI blocks

* guide: update Data Mgmt text

* guide: update text and 2nd figure (Data Mgmt)

* guide: draft 2nd and 3rd figures

* guide: rewrite Data Mgmt/ How it works &

and Benefits/ Tradeoffs

Probably still unfinished... Missing more data versioning info? See HTML comments.

* guide: update drafts of Data Mgmt figures 2, 3

* guide: Data Mgmt improvements and

hide the benefits list for now

* guide: separate from Data Mgmt work

Rel. #4042

* Apply suggestions from code review

* Merge branch main +

* ref: bring cloud versioning copy edits of import-url

from
https://github.com/iterative/dvc.org/pull/4260/files#diff-ef95e18c4bd039757695065a23946dc27e28b4727ce07c670cdc096e34dbe3b3

* ref: clarify import-url with cloud versioning

per #4142 (review)

* ref: updates to import-url --version-aware and

update --rev

* ref: add import-url --version aware to Synopsis

per #4089 (comment)

* Restyled by prettier (#4266)

Co-authored-by: Restyled.io <[email protected]>

* ref: updates around worktree updates (cloud versioning)

* ref: link from `remote` (index) to storage types

* guide: roll back changes to dvc.yaml `rev` field spec

* Update content/docs/command-reference/update.md

* guide: link refs in .dvc file spec

* Restyled by prettier (#4319)

Co-authored-by: Restyled.io <[email protected]>

* Update content/docs/command-reference/update.md

---------

Co-authored-by: Dave Berenbaum <[email protected]>
Co-authored-by: rogermparent <[email protected]>
Co-authored-by: restyled-io[bot] <32688539+restyled-io[bot]@users.noreply.github.com>
Co-authored-by: Restyled.io <[email protected]>
shcheklein pushed a commit that referenced this pull request Mar 9, 2023
* guide: draft structure of Data Mgmt and
some updates around the topic in existing docs

* guide: full text for draft intro to DM

* guide: hide cloud versioning info
per #4042 (review)

* guide: clarify Data Mgmt parts and
add prospective figure titles

* guide: add figure drafts to Data Mgmt

* guide: SCM->VC (Data Mgmt)

* guide: update 2 figs and add 1 more (Data Mgmt)

* guide: roll back unrelated changes
per #4042 (review)

* guide: mention clouds first (DM) and

and update fig. 1
per #4042 (review)

* guide: flatten DM index
per #4042 (review)

* guide: updates to DM/ DV
moved from #4053 (review)

* guide: add DM/ Data Versioning page

per #4042 (comment)

* guide: update outdated link

* guide: revert more unrelatedly changed files

per #4042 (review)

* guide: remove unused ref link

* guide: DM/ Remote Storage (not just Setup) and

and some links from cmd refs
and avoid term "data remote"
and some admons nearby...

* guide: remove a comment

* guide: draft for DM/ Remote Storage content

* ref: expand config.remote and link to/from Remotes guide

* ref: fix remote config file examples

* guide: complete Remote Config section and

and add Project config section to DM/ DV guide

* ref: rewrite remote add and modify Descs

* guide: complete list of supported storage types

* ref: rewrite remote index page from

extracted from #4053

* guide: clarify `remote modify` phrase in

in the Remote config section of DM/ Remote Storage

* Update content/docs/user-guide/data-management/data-versioning.md

* guide: update versioning config

per #4058 (review)

* guide: don't call remote storage "additional" here

(in the DM/ Remote Storage guide)
per #4058 (review)

Co-authored-by: Dave Berenbaum <[email protected]>

* guide: pull -> download (DM/ RS intro)

* guide: remove "optional" from Remote Storage nav & title

per #4058 (review)

* guide: splits and notes around Data Mgmt index page

rel. #4042 (comment)

* guide: Data Mgmt intro + note updates

* guide: draft of all contents +

+ remove comments

* guide: small impros to Data Mgmt

in prep for #4042 (review)

* guide: rewrite Data Mgmt index in before/after form

per #4042 (review)

* guide: add draft figure for Data Mgmt

* guide: simplify/refocus data mgmt index

per #4042 (review)

* work around commented header bug

* guide: drop DM/ DV page

* guide: rewrite DM intro and

- hide benefits (for now)
- remove codification comment block

* guide: use DM table instead of figure for now

* guide: rewrite Data Mgmt story

* guide: add draft figures to Data Mgmt

* guide: simplify Data Mgmt story and benefits

* guide: remove unused images (DM)

* guide: update Data Mgmt figures (v1)

* guide: rewrite text of Data Mgmt index

* guide: update Data Mgmt figures

* guide: iterate on Data Mgmt again

* guide: update Data Mgmt figs

* guide: more supporting info about Data Mgmt

* guide: update figures (much more concrete) and

and matching text updates

* guide: edits to How it works (Data Mgmt)

* guide: update Data Mgmt figures

Rel. #4042 (comment)

* guide: emphasize dataset versions in UG fig 1

Rel. #4042 (comment)

* guide: update Data Mgmt figures (with notes),

expand img captions,
and update text accordingly.

* guide: more updates to text and figure styles,

esp. to the first half
and comment some stuff out (temporary)

* guide: update figures and text (Data Mgmt) ...

Using a tabs toggle for the 2nd fig.

* guide: Data Management text (section 1)

finalized for this version of figures

* guide: Data Management (main text)

finalized for this version of figures

* guide: Data Management (secondary text)

pending diagram and code sample(s)

* guide: add DVC data mgmt technical diagram &

dummy sample CLI blocks

* guide: update Data Mgmt text

* guide: update text and 2nd figure (Data Mgmt)

* guide: draft 2nd and 3rd figures

* guide: rewrite Data Mgmt/ How it works &

and Benefits/ Tradeoffs

Probably still unfinished... Missing more data versioning info? See HTML comments.

* guide: update drafts of Data Mgmt figures 2, 3

* guide: Data Mgmt improvements and

hide the benefits list for now

* guide: separate from Data Mgmt work

Rel. #4042

* Apply suggestions from code review

* Merge branch main +

* ref: update links from API to Remotes guide

* guide: update links around Remote Storage and

and other updates to nearby Markdown (e.g. proper admons)

* Roll back unrelated changes

* Restyled by prettier (#4261)

Co-authored-by: Restyled.io <[email protected]>

* ref: bring cloud versioning copy edits of import-url

from
https://github.com/iterative/dvc.org/pull/4260/files#diff-ef95e18c4bd039757695065a23946dc27e28b4727ce07c670cdc096e34dbe3b3

* ref: clarify import-url with cloud versioning

per #4142 (review)

* ref: updates to import-url --version-aware and

update --rev

* ref: add import-url --version aware to Synopsis

per #4089 (comment)

* Restyled by prettier (#4266)

Co-authored-by: Restyled.io <[email protected]>

* Restyled by prettier (#4322)

Co-authored-by: Restyled.io <[email protected]>

* Update content/docs/command-reference/remote/modify.md

Co-authored-by: Oded Messer <[email protected]>

* Update content/docs/command-reference/remote/modify.md

Co-authored-by: Oded Messer <[email protected]>

* Update content/docs/command-reference/push.md

Co-authored-by: Oded Messer <[email protected]>

* yarn format-all

---------

Co-authored-by: Dave Berenbaum <[email protected]>
Co-authored-by: rogermparent <[email protected]>
Co-authored-by: restyled-io[bot] <32688539+restyled-io[bot]@users.noreply.github.com>
Co-authored-by: Restyled.io <[email protected]>
Co-authored-by: Oded Messer <[email protected]>
Labels
A: docs Area: user documentation (gatsby-theme-iterative) C: ref Content of /doc/*-reference